Agent-Based Distributed Learning Applied to Fraud Detection

Authors

  • Andreas L. Prodromidis
  • Salvatore Stolfo
Abstract

Inductive learning and classification techniques have been applied to many problems in diverse areas. In this paper we describe an AI-based approach that combines inductive learning algorithms and meta-learning methods to compute accurate classification models for detecting electronic fraud. Inductive learning algorithms are used to compute detectors of anomalous or errant behavior over inherently distributed data sets, and meta-learning methods integrate their collective knowledge into higher-level classification models, or meta-classifiers. By supporting the exchange of models or classifier agents among data sites, our approach facilitates cooperation between financial organizations and provides unified, cross-institution protection mechanisms against fraudulent transactions. Through experiments performed on actual credit card transaction data supplied by two different financial institutions, we evaluate this approach and demonstrate its utility.

Introduction

Agent-based learning systems have attracted considerable attention recently. One such system, JAM (Java Agents for Meta-learning) (Stolfo et al. 1997), provides the means of dispatching and executing learning agents at remote database sites, with each learning agent being a Java-encapsulated machine learning program. One of JAM's key features is meta-learning, a general technique that combines multiple classification models, each of which may have been computed over distributed sites. In this paper, we describe the application of JAM to fraud and intrusion detection in network-based information systems by detailing a comprehensive set of experiments in the real-world application of credit card fraud. Several key new techniques are reported that address the issues of pruning combined meta-classifiers in order to improve efficiency while maintaining accuracy. Furthermore, experiments are reported on combining classifiers computed over distributed databases with different schemas.
We begin with a brief overview of the fraud detection application to highlight the advantages of JAM in distributed learning.

Fraud detection

A secured and trusted interbanking network for electronic commerce requires high-speed verification and authentication mechanisms that allow legitimate users easy access to conduct their business, while thwarting fraudulent transaction attempts by others. Fraudulent electronic transactions are a significant problem, one that grows in importance as the number of access points increases and more services are provided. The traditional way to defend financial information systems has been to protect the routers and network infrastructure. Furthermore, to intercept intrusions and fraudulent transactions that inevitably leak through, financial institutions have developed custom fraud detection systems targeted to their own asset bases. Recently, however, banks have come to realize that a unified, global approach that involves the periodic sharing of information regarding fraudulent practices is required. In this paper, we describe an AI-based approach that supports cooperation among different institutions and consists of pattern-directed inference systems that use models of anomalous or errant transaction behaviors to forewarn of fraudulent practices. This approach requires the analysis of large and inherently distributed databases of information about transaction behaviors to produce models of “probably fraudulent” transactions. The key difficulties in this approach are: financial companies don’t share their data for a number of (competitive and legal) reasons; the databases that companies maintain on transaction behavior are huge and growing rapidly; real-time analysis is highly desirable to update models when new events are detected; and easy distribution of models in a networked environment is essential to maintain up-to-date detection capability.
To address these difficulties and thereby protect against electronic fraud, our approach has two key component technologies: local fraud detection agents that learn how to detect fraud within a single information system, and an integrated meta-learning mechanism that combines the collective knowledge acquired by the individual local agents. The fraud detection agents consist of classification models computed by machine learning programs at one or more sites, while meta-learning provides the means of combining a number of separately learned classifiers. Thus, meta-learning allows financial institutions to share their models of fraudulent transactions without disclosing their proprietary data. In this way their competitive and legal restrictions can be met, yet they can still share information. Furthermore, by supporting the training of classifiers over distributed databases, meta-learning can substantially reduce the total learning time (through parallel learning of classifiers over smaller subsets of data). The final meta-classifiers (the combined ensemble of fraud detectors) can be used as sentries, forewarning of possible fraud by inspecting and classifying each incoming transaction.

Note: An alternative to modeling transactions would be to model user behavior; an application of this method to cellular phone fraud detection has been examined in (Fawcett & Provost 1997).

This paper presents a comprehensive set of experiments evaluating the applicability of our approach to the security of financial information systems. As a test set we use a data set of credit card transactions supplied by two different financial institutions. The task is to compute classification models that accurately discern fraudulent credit card transactions. Our experiments are structured as follows. First we apply several machine learning algorithms to different subsets of data from both banks to establish the potential of inductive learning methods in fraud detection.
Then, we overview meta-learning and evaluate its utility by combining the fraud detectors of each bank. In the last part of the paper, we describe the exchange of classifiers between the two banks and provide empirical results for assessing the validity and merit of this approach. By way of summary, we find that pattern-directed inference systems coupled with meta-learning methods constitute a protective shield against fraud with the potential to exceed the performance of existing fraud detection techniques.

Computing fraud detectors

Machine Learning

In this study we employ five different inductive learning programs: Bayes, C4.5, ID3, CART and Ripper. Bayes implements a naive Bayesian learning algorithm described in (Minsky & Papert 1969); ID3 (Quinlan 1986), its successor C4.5 (Quinlan 1993), and CART (Breiman et al. 1984) are decision-tree-based algorithms; and Ripper (Cohen 1995) is a rule induction algorithm.

Data sets

We obtained two large databases from Chase and First Union banks, each with 500,000 records of credit card transaction data spanning one year (Oct. 1995 to Sept. 1996). Chase bank data consisted, on average, of 42,000 sampled credit card transaction records per month with a 20% fraud and 80% legitimate distribution, whereas First Union data were sampled in a non-uniform manner (many records from some months, very few from others, and very skewed fraud distributions for some months), with an overall distribution of 15% fraud versus 85% legitimate. The schemas of the databases were developed over years of experience and continuous analysis by bank personnel to capture important information for fraud detection. The records have a fixed length of 137 bytes each and about 30 numeric and categorical attributes, including the binary class label (fraud/legitimate transaction).
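As a quick check on what the quoted class distributions imply, the following minimal sketch derives approximate per-class record counts and the accuracy of a trivial detector that always predicts the majority class. The function names are hypothetical, and the derived counts are approximations computed from the totals and ratios stated in the text, not figures reported by the authors.

```python
from typing import Tuple

# Illustrative arithmetic only: totals and ratios are taken from the text;
# the derived counts are approximations, not reported results.


def class_counts(total_records: int, fraud_ratio: float) -> Tuple[int, int]:
    """Split a record total into (fraud, legitimate) counts for a fraud ratio."""
    fraud = round(total_records * fraud_ratio)
    return fraud, total_records - fraud


def majority_baseline_accuracy(fraud_ratio: float) -> float:
    """Accuracy of a trivial detector that always predicts the majority class."""
    return max(fraud_ratio, 1.0 - fraud_ratio)


# Fraud ratios as stated for each bank's sample.
datasets = {"Chase": 0.20, "First Union": 0.15}

for bank, ratio in datasets.items():
    fraud, legit = class_counts(500_000, ratio)
    print(bank, fraud, legit, majority_baseline_accuracy(ratio))
```

Any learned fraud detector has to beat this trivial baseline (about 80% for Chase and 85% for First Union) before its accuracy says anything about its ability to detect fraud.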
Under the terms of the non-disclosure agreement, we cannot reveal the details of the schema beyond the following general description:

  • A (jumbled) account number (no real identifiers)
  • Scores produced by a COTS authorization/detection system
  • Date/Time of transaction
  • Past payment information of the transactor
  • Amount of transaction
  • Geographic information: where the transaction was initiated, the location of the merchant and transactor
  • Codes for validity and manner of entry of the transaction
  • An industry standard code for the type of merchant
  • A code for other recent “non-monetary” transaction types by transactor
  • The age of the account and the card
  • Other card/account information
  • Confidential/Proprietary Fields (other potential indicators)
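The meta-learning idea described in the fraud detection overview, in which each site trains a local detector and only the resulting classifiers are exchanged and combined, can be illustrated with a minimal sketch. This is not JAM's API and not the authors' learners (Bayes, C4.5, ID3, CART, Ripper): the base learner here is a toy one-feature threshold rule, the combiner is a simple stacking-style lookup table, and the names `learn_threshold` and `learn_meta`, the two features (amount, hour of day), and all records are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Record = Tuple[float, float]          # hypothetical features: (amount, hour of day)
Classifier = Callable[[Record], int]  # returns 1 for fraud, 0 for legitimate
Labeled = Sequence[Tuple[Record, int]]


def learn_threshold(data: Labeled, feature: int) -> Classifier:
    """Toy base learner: choose the threshold on one feature that
    classifies the most local training records correctly."""
    best_t, best_correct = 0.0, -1
    for x, _ in data:
        t = x[feature]
        correct = sum(int(r[feature] >= t) == y for r, y in data)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return lambda r: int(r[feature] >= best_t)


def learn_meta(base: List[Classifier], validation: Labeled) -> Classifier:
    """Stacking-style combiner: map each pattern of base predictions
    to the most common true label seen for that pattern."""
    votes: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)
    for r, y in validation:
        votes[tuple(c(r) for c in base)][y] += 1
    table = {p: counts.most_common(1)[0][0] for p, counts in votes.items()}

    def meta(r: Record) -> int:
        pattern = tuple(c(r) for c in base)
        # Patterns never seen during validation fall back to a majority vote.
        return table.get(pattern, int(sum(pattern) * 2 > len(pattern)))

    return meta


# Each bank trains on its own data; only the classifiers leave the site.
bank_a = [((20.0, 10.0), 0), ((35.0, 14.0), 0), ((900.0, 11.0), 1), ((1200.0, 15.0), 1)]
bank_b = [((40.0, 9.0), 0), ((60.0, 13.0), 0), ((50.0, 23.0), 1), ((30.0, 22.0), 1)]
clf_a = learn_threshold(bank_a, feature=0)  # learns an amount-based rule
clf_b = learn_threshold(bank_b, feature=1)  # learns a time-of-day rule

# A bank combines its own and the imported classifier on local validation data.
validation = [((25.0, 12.0), 0), ((1000.0, 12.0), 1), ((45.0, 23.0), 1),
              ((1500.0, 22.5), 1), ((30.0, 16.0), 0)]
meta = learn_meta([clf_a, clf_b], validation)

print(meta((950.0, 10.0)), meta((20.0, 23.5)), meta((15.0, 11.0)))
```

In this toy setting the meta-classifier flags a small late-night transaction that bank A's own amount-based detector would miss, which is the kind of benefit the exchange of classifiers is meant to provide without either bank disclosing its records.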

Similar resources

MEFUASN: A Helpful Method to Extract Features using Analyzing Social Network for Fraud Detection

Fraud detection is one of the ways to cope with damages associated with fraudulent activities that have become common due to the rapid development of the Internet and electronic business. Methods are needed that detect fraud both accurately and quickly. To achieve accuracy, fraud detection methods need to consider both kinds of features: features based on user level and features based o...

Fraud Detection of Credit Cards Using Neuro-fuzzy Approach Based on TLBO and PSO Algorithms

The aim of this paper is to detect fraud related to bank credit cards. Because of the large volume of data and the similarity among samples, traditional classification methods separate healthy from unhealthy sample behavior slowly and with low accuracy. Therefore, in this study the Adaptive Neuro-Fuzzy Inference System (ANFIS) is used to obtain a more efficient and accurate algorithm. By com...

Fast Unsupervised Automobile Insurance Fraud Detection Based on Spectral Ranking of Anomalies

Collecting insurance fraud samples is costly and, if performed manually, very time consuming. This suggests the use of unsupervised models. One accurate method in this regard is Spectral Ranking of Anomalies (SRA), which has been shown to work better than other methods specifically for auto insurance fraud detection. However, this approach is not scalable to large samples and is not appro...

Credit Card Fraud Detection using Data mining and Statistical Methods

Due to today’s advances in technology and business, fraud detection has become a critical component of financial transactions. Given the vast amounts of data in large datasets, it becomes more difficult to detect fraudulent transactions manually. In this research, we propose a combined method using both data mining and statistical tasks, utilizing feature selection, resampling and cost-...

A hybrid model based on machine learning and genetic algorithm for detecting fraud in financial statements

Financial statement fraud has increasingly become a serious problem for business, government, and investors. In fact, this threatens the reliability of capital markets, corporate heads, and even the audit profession. Auditors in particular face their apparent inability to detect large-scale fraud, and there are various ways to identify this problem. In order to identify this problem, the majori...

Publication date: 1999